if (!requireNamespace("BiocManager", quietly = TRUE)){
install.packages("BiocManager")}
if (!requireNamespace("edgeR", quietly = TRUE)){
BiocManager::install("edgeR")}
if (!requireNamespace("GEOquery", quietly = TRUE)){
BiocManager::install("GEOquery")}
if (!requireNamespace("ComplexHeatmap", quietly = TRUE)){
BiocManager::install("ComplexHeatmap")}
if (!requireNamespace("limma", quietly = TRUE)){
BiocManager::install("limma")}
if (!requireNamespace("ExpressionSet", quietly = TRUE)){
BiocManager::install("ExpressionSet")}
if (!requireNamespace("circlize", quietly = TRUE)){
BiocManager::install("circlize")}
if (!requireNamespace("gprofiler2", quietly = TRUE))
BiocManager::install("gprofiler2")
if (!requireNamespace("kableExtra", quietly = TRUE))
BiocManager::install("kableExtra")
if (!requireNamespace("plotly", quietly = TRUE))
BiocManager::install("plotly")
if (!requireNamespace("dplyr", quietly = TRUE))
BiocManager::install("dplyr")
library(dplyr)
library(plotly)
The normalized data has already been written into a text file by the previous Assignment 1. For better assessment, I have indicated it on the file uploaded.
normalized_count_data <- read.table(file=file.path(getwd(), "normalized_counts_annotated")
, header = TRUE, stringsAsFactors = FALSE, check.names = FALSE )
knitr::kable(normalized_count_data[1:10,1:7], type="pipe")
| hgnc_symbol | YRE1A | YRE1B | YRE2A | YRG1A | YRG1B | YRG2B |
|---|---|---|---|---|---|---|
| A4GALT | 4.345648 | 3.111477 | 2.408097 | 5.369569 | 6.397003 | 6.796014 |
| AAAS | 19.317681 | 22.327817 | 21.792253 | 18.384756 | 17.419577 | 17.311220 |
| AACS | 32.913059 | 24.534108 | 24.466495 | 23.731644 | 17.621289 | 17.428989 |
| AADACP1 | 2.418554 | 4.507231 | 3.916687 | 13.734990 | 14.270060 | 14.143042 |
| AADAT | 4.058510 | 3.995899 | 3.647402 | 4.637361 | 4.052613 | 3.678086 |
| AAED1 | 14.351471 | 11.913866 | 12.242066 | 12.711562 | 8.425186 | 7.790737 |
| AAGAB | 36.922469 | 35.816951 | 39.338927 | 40.358329 | 31.836730 | 30.961949 |
| AAK1 | 1.531217 | 1.364910 | 1.534162 | 2.165459 | 2.677295 | 2.667404 |
| AAMDC | 46.807715 | 44.246410 | 48.924065 | 66.090858 | 49.702345 | 44.178113 |
| AAMP | 82.175802 | 88.352333 | 93.832788 | 76.203567 | 81.078221 | 82.443384 |
I am defining the elements of the normalized data matrix which I will be using for the heatmap later.
The heatmap uses different colors for the expression data. The heatmap has many genes clustered based on their expression at different levels such as control (EA1,EA2,EB1) or the cells treated with GLI2 (G1A,G2A,G1B).
if(min(heatmap_matrix) == 0){
heatmap_col = circlize::colorRamp2(c( 0, max(heatmap_matrix)),
c("white", "purple"))
} else {
heatmap_col = circlize::colorRamp2(c(min(heatmap_matrix), 0,
max(heatmap_matrix)), c("darkgreen", "white", "purple"))
}
first_heatmap <- ComplexHeatmap::Heatmap(as.matrix(heatmap_matrix),
show_row_dend = TRUE, show_column_dend = TRUE,
col=heatmap_col, show_column_names = TRUE,
show_row_names = FALSE, show_heatmap_legend = TRUE)
first_heatmap
### Part 4: SPP1 EMT Genes-Conclusion justification
SPP1 is an EMT
genes It shows higher expression when treated with GLI2 compare to empty
vector control cells (EV). This has been also proved by the authors.
However, indicating this again would strengthen the hypothesis
corresponding to Dox treated cells having GLI2 vector show higher EMT
gene expression as opposed to controls having empty vector showing less
expression of the genes.
GLI2_treatment_samples <- grep(colnames(normalized_count_data),pattern = "\\G")
EV_tratment_samples <- grep(colnames(normalized_count_data),pattern = "\\E")
gene_of_interest <- which(normalized_count_data$Gene_symbol == "SPP1")
YAPC_GLI2_samples <- t(normalized_count_data[gene_of_interest,5:7])
colnames(YAPC_GLI2_samples) <- c("GLI2_vector_cells")
YAPC_EV_samples <- t(normalized_count_data[gene_of_interest,2:4])
colnames(YAPC_EV_samples) <- c("Empty_Vector_cells")
YAPC_GLI2_samples
## GLI2_vector_cells
## YRG1A 321.9614
## YRG1B 607.5096
## YRG2B 580.2982
YAPC_EV_samples
## Empty_Vector_cells
## YRE1A 1.307170
## YRE1B 1.616595
## YRE2A 1.752119
GLI2 treated cells are different in terms of their expression from the empty vector cells. P value (0.03135) < 0.05. Indeed this shows there is significant difference. However, this requires to see if GLI2 treatments show difference among one another so the control cells.
t.test(x=t(YAPC_GLI2_samples), y=t(YAPC_EV_samples))
##
## Welch Two Sample t-test
##
## data: t(YAPC_GLI2_samples) and t(YAPC_EV_samples)
## t = 5.5139, df = 2, p-value = 0.03135
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 110.2125 893.1831
## sample estimates:
## mean of x mean of y
## 503.256412 1.558628
The MDS plot allowed to visualize the difference within the different cell types Earlier examples of MDS plot showed similar results. I hypothesize that GLI2 expression is higher than control cell’s expression levels due to DOX induction of EMT genes. In general I have observed Gli2 cells to be less clustered, due to higher differential expression among GLI2 cells.
limma::plotMDS(heatmap_matrix,col=c(rep("darkgreen",3), rep("blue",3)))
### Part 7: Model Building of GLI2 expression status The model I am
building is based upon GLI2 expression status of genes that are showing
higher expression with significant indications of the differential
expression.
samples <- data.frame(lapply(colnames(normalized_count_data)[2:7],
FUN=function(x){unlist(strsplit(x, split = "[0-9]"))[c(1)]}))
colnames(samples) <- colnames(colnames(normalized_count_data)[2:7])
rownames(samples) <- c("Treatments")
samples <- data.frame(t(samples))
knitr::kable(samples, type = "pipe")
| Treatments |
|---|
| YRE |
| YRE |
| YRE |
| YRG |
| YRG |
| YRG |
YRG_model <- model.matrix( ~ samples$Treatments)
knitr::kable(YRG_model[1:6,], type = "pipe")
| (Intercept) | samples$TreatmentsYRG |
|---|---|
| 1 | 0 |
| 1 | 0 |
| 1 | 0 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
The differential expression data matrix corresponding to gene symbols were merged with the matrix that has the fitted model. The Benjamin-hochberg correction method was applied to adjust the expression data.
expressionMatrix <- as.matrix(normalized_count_data[,2:7])
rownames(expressionMatrix) <- normalized_count_data$Gene_symbol
colnames(expressionMatrix) <- colnames(normalized_count_data)[2:7]
set_expression_matrix <- Biobase::ExpressionSet(assayData = expressionMatrix)
fit <- limma::lmFit(set_expression_matrix, YRG_model)
fit_Bayes <- limma::eBayes(fit, trend = TRUE)
BH_method_fit <- limma::topTable(fit_Bayes,
coef = ncol(YRG_model),
adjust.method = "BH",
number = nrow(expressionMatrix))
merged_hits <- merge(normalized_count_data$Gene_symbol,
BH_method_fit,
by.y=0,by.x=1,
all.y=TRUE)
merged_hits <- merged_hits[order(merged_hits$P.Value),]
colnames(merged_hits) <- c("Gene_symbol","logFC", "AveExpr", "t",
"P.Value","adj.P.Val", "B")
rownames(merged_hits) <- 1:nrow(merged_hits)
knitr::kable(merged_hits[1:10,1:7], type = "pipe", row.names = FALSE,)
| Gene_symbol | logFC | AveExpr | t | P.Value | adj.P.Val | B |
|---|---|---|---|---|---|---|
| S100A2 | 192.67873 | 105.164286 | 63.79569 | 0e+00 | 0.0000245 | 1.820600 |
| SSFA2 | 76.45805 | 75.591839 | 45.17567 | 0e+00 | 0.0000789 | 1.781362 |
| NT5E | 86.27685 | 123.407858 | 42.93449 | 0e+00 | 0.0000789 | 1.773004 |
| S100A13 | 132.80964 | 179.285328 | 37.98876 | 0e+00 | 0.0001191 | 1.749192 |
| TRIB2 | 51.68968 | 38.062060 | 35.69100 | 1e-07 | 0.0001361 | 1.734718 |
| HEG1 | -13.44713 | 8.317468 | -31.24508 | 1e-07 | 0.0002423 | 1.697467 |
| NTN4 | 35.83705 | 33.048568 | 28.42566 | 2e-07 | 0.0003561 | 1.664696 |
| UCP2 | 80.78518 | 94.331418 | 26.66241 | 3e-07 | 0.0004488 | 1.638993 |
| PLLP | -31.42241 | 35.188668 | -25.42904 | 4e-07 | 0.0004683 | 1.617932 |
| UNC5B | 19.59576 | 12.041479 | 24.78963 | 5e-07 | 0.0004683 | 1.605831 |
The p-value was adjusted to have more stringent results. Normally significant p-value is set to 0.05. However, I wanted to be more stringent on my results and indicate the specific EMT genes. Probable genes are 741. However, there are 27 genes that have significant potential to be an EMT gene.
length(which(merged_hits$P.Value < 0.0005))
## [1] 741
length(which(merged_hits$adj.P.Val < 0.0005))
## [1] 27
This part of this assignment aims to detect SPP1’s role on how GLI2 cells are more differently expressed and do have higher difference than the control cells. It is evident that, with more stringent p-value, the significant portion (741) of about 11500 genes are showing features of EMT genes
model_pvalues <- data.frame(Gene_symbol = merged_hits$Gene_symbol,
pvalue = merged_hits$P.Value)
model_pvalues$color <- "black"
model_pvalues$color[model_pvalues$pvalue < 0.0005] <- "orange"
SPP1 <- normalized_count_data$Gene_symbol[which(normalized_count_data$Gene_symbol == "SPP1")]
model_pvalues$color[model_pvalues$Gene_symbol== SPP1] <- "red"
plot(model_pvalues$pvalue,
col = model_pvalues$color,
xlab = "Number of Genes within the Dataset",
ylab ="P-value",
main="P value distribution of EMT Genes")
points(model_pvalues[which(model_pvalues$Gene_symbol == "SPP1"), 1:2],
pch = 20, col="red", cex=1.5)
legend(0, 1, legend=c("SPP1"), fill=c("red"))
### Part 11: Clustered Heatmap with More Stringent Values With more
stringent gene size I was able to see cleaner difference between GLI2
and the control cells. The clustered genes are the symmetric each other
across the diagonals. The high expression showing genes on the GLI2’s
side is lower on the EV’ side. Top hits are assessed on the stringent
p-value of 0.0005. The difference between the first and the latest
heatmap shows there is significant difference in terms of gene
expression whose expression is represented for each respective
Treatments
top_hits <- merged_hits$Gene_symbol[merged_hits$P.Value < 0.0005]
heatmap_matrix_tophits <- t(scale(t(heatmap_matrix[which(rownames(heatmap_matrix)
%in% top_hits),])))
if(min(heatmap_matrix_tophits) == 0){
heatmap_col = circlize::colorRamp2(c( 0, max(heatmap_matrix_tophits)),
c( "white", "purple"))
} else{
heatmap_col = circlize::colorRamp2(c(min(heatmap_matrix_tophits), 0,
max(heatmap_matrix_tophits)), c("darkgreen", "white", "purple"))
}
latest_heatmap <- ComplexHeatmap::Heatmap(as.matrix(heatmap_matrix_tophits),
show_row_dend = TRUE, show_column_dend = TRUE, col=heatmap_col,
show_column_names = TRUE, show_row_names = FALSE, show_heatmap_legend = TRUE)
latest_heatmap
first_heatmap
### Part 12: Dataset Filtering and Data Extracting The previously
extracted GSE131222 data from gene omnibus expression (GEO) is extracted
separately. The low count portions of the data filtered and a new matrix
was created.
data = GEOquery::getGEOSuppFiles("GSE131222")
datanames = rownames(data)
GLI2_exp =read.delim(datanames[1],header=TRUE,check.names = FALSE)
count_per_ms = edgeR::cpm(GLI2_exp[,3:8])
rownames(count_per_ms) <- GLI2_exp[,2]
rows_filtered = rowSums(count_per_ms >1) >= 6
GLI2_exp_filtered = GLI2_exp[rows_filtered,]
filtered_GLI2_matrix <- as.matrix(GLI2_exp_filtered[,3:8])
rownames(filtered_GLI2_matrix) <- GLI2_exp_filtered$gene
head(GLI2_exp_filtered)[1:8]
## gene locus YRE1A YRE1B YRE2A YRG1A
## 7 LINC01128 chr1:762970-794826 3.65922 3.95950 4.17393 3.89816
## 9 KLHL17 chr1:895966-901099 8.94838 10.89430 11.20550 6.99464
## 10 PLEKHN1 chr1:901876-910484 4.90645 5.28867 5.02146 4.31839
## 11 ISG15 chr1:948846-949919 322.87300 251.30200 287.47100 328.22400
## 12 AGRN chr1:955502-991499 198.02100 221.32500 221.63500 163.25300
## 18 B3GALT6 chr1:1167628-1170420 10.17300 12.26130 12.14880 11.34150
## YRG1B YRG2B
## 7 3.33866 3.57704
## 9 10.45200 10.27520
## 10 4.29991 4.14435
## 11 562.14200 586.97600
## 12 214.04600 219.62400
## 18 12.56020 11.78190
The similar grouping was done previously. However this time dispersion values, normalization factors and differential expression model fitting is applied to the samples of GLI2 treatments.
grouped_treatments <- data.frame(lapply(colnames(GLI2_exp_filtered)[3:8],
FUN=function(x){unlist(strsplit(x, split = "[0-9]"))[c(1)]}))
colnames(grouped_treatments) <- colnames(GLI2_exp)[3:8]
rownames(grouped_treatments) <- c("cell_type")
grouped_treatments_normalized <- data.frame(t(grouped_treatments))
d = edgeR::DGEList(counts=filtered_GLI2_matrix, group = grouped_treatments_normalized$cell_type)
model_design_celltypes <- model.matrix(~grouped_treatments_normalized$cell_type+0)
d <- edgeR::estimateDisp(d, model_design_celltypes)
d <- edgeR::calcNormFactors(d)
fit <- edgeR::glmQLFit(d, model_design_celltypes)
qlf_test.GLI2vsEV <- edgeR::glmQLFTest(fit, coef = "grouped_treatments_normalized$cell_typeYRG")
With more stringent p-value I was able to assess the critical threshold for the expression threshold. I set the p-value as 0.0005. I decided to put it by adjusting and observing any difference. The difference between p-value 0.005 and p-value 0.0005 is one. This means this p-value is safe to assume as a threshold for significance among these differentially expressing genes.
qlf_top_hits <- edgeR::topTags(qlf_test.GLI2vsEV,sort.by = "PValue", n = nrow(filtered_GLI2_matrix))
head(qlf_top_hits)
## Coefficient: grouped_treatments_normalized$cell_typeYRG
## logFC logCPM F PValue FDR
## ADAM15 -14.66671 5.427443 1252636.1 8.490905e-20 3.69538e-16
## LAMB1 -14.63332 5.416059 1201380.1 9.840550e-20 3.69538e-16
## RRAGA -14.43536 5.426032 1033729.1 1.672979e-19 3.69538e-16
## SMAP2 -14.84240 5.427397 999721.4 1.882720e-19 3.69538e-16
## DDX18 -14.60997 5.420251 920508.0 2.519831e-19 3.69538e-16
## CASC4 -14.51764 5.430442 856813.2 3.245837e-19 3.69538e-16
length(which(qlf_top_hits$table$PValue < 0.0005))
## [1] 2434
length(which(qlf_top_hits$table$FDR < 0.0005))
## [1] 2433
length(which(qlf_top_hits$table$logFC < 0))
## [1] 11527
The threshold was calculated and based its value I assessed the how many genes actually show traits similar to GLI2. I used P value 1 to eliminate significant portion of the differentially expressed genes. I did not use the LogFC to distinguish between EMT GLI2 vs non-EMT GLI2, because all of logFC values are negative. This indicates there is always up regulation after the Dox treatment and GLI2 expression is always higher than the control. Pretty high confidence.
qlf_tophits_withgn <- merge(GLI2_exp[,1:1], qlf_top_hits, by.y=0,by.x=1,all.y=TRUE)
colnames(qlf_tophits_withgn) <- c("Gene_symbol", "logFC", "logCPM","F", "PValue", "FDR")
qlf_tophits_withgn[,"logarithmic"] <- (-log(qlf_tophits_withgn$PValue))
qlf_tophits_withgn <- qlf_tophits_withgn[order(qlf_tophits_withgn$Gene_symbol),]
GLI2_emt_Genes <- qlf_tophits_withgn$Gene_symbol[which(qlf_tophits_withgn$PValue < 0.0005
& qlf_tophits_withgn$logFC < 0)]
GLI2_non_emtgenes <- qlf_tophits_withgn$Gene_symbol[which(qlf_tophits_withgn$PValue == 1)]
write.table(x=GLI2_emt_Genes, file = file.path(getwd(),"data","GLI2_emt_Genes.txt"),
sep = "\t", row.names = FALSE, col.names = FALSE, quote = FALSE)
write.table(x=GLI2_non_emtgenes, file = file.path(getwd(),"data","GLI2_non_emtgenes.txt"),
sep = "\t", row.names = FALSE, col.names = FALSE, quote = FALSE)
genename <- qlf_tophits_withgn$Gene_symbol[18:nrow(qlf_tophits_withgn)]
genename <- unlist(strsplit(genename, split = ","))
genename <- data.frame(genename)
qlf_tophits_withgn <- merge(genename$genename,qlf_tophits_withgn[,7], by.y=0,by.x=0,all.y=TRUE)
qlf_tophits_withgn <- qlf_tophits_withgn[,2:3]
colnames(qlf_tophits_withgn)[1] <- "Genename"
colnames(qlf_tophits_withgn)[2] <- "Logarithmic"
write.table(x=qlf_tophits_withgn,file=file.path("data","logarithmic_genelist.rnk"),
sep = "\t",append=FALSE,row.names = FALSE,col.names = FALSE,quote = FALSE)
The g:Profiler analysis includes BH correction methods (FDR). I sticked using the same method because I used the same type of method on my previous analysis on FDR portion. For both upregualted and the downregulated genes I have used GO:BP (2022-12-04), Reactome, and WikiPathways( for annotation. These annotations allows for better asessment of my Upregualted genes in terms of defining the biologic pathways they are invovled in.
Upregulated_genes <- read.table(file=file.path(getwd(),"data" ,"GLI2_emt_Genes.txt"),header = TRUE, stringsAsFactors = FALSE, check.names = FALSE)
Upregulated_genes <- Upregulated_genes[4:nrow(Upregulated_genes),]
Upregulated_Gprofiler_validated <- gprofiler2::gost(query = Upregulated_genes,
organism = "hsapiens",
exclude_iea = TRUE,
sources = c("GO:BP", "REAC", "WP"),
correction_method = "fdr",
ordered_quer = FALSE,
)
Downregulated_genes <- read.table(file=file.path(getwd(),"data" ,"GLI2_non_emtgenes.txt"),header = TRUE, stringsAsFactors = FALSE, check.names = FALSE)
Downregulated_genes <- Downregulated_genes[13:nrow(Downregulated_genes),]
Downregulated_Gprofiler_validated <- gprofiler2::gost(query = Downregulated_genes,
organism = "hsapiens",
exclude_iea = TRUE,
correction_method = "fdr",
ordered_quer = FALSE,
source = c("GO:BP", "REAC", "WP"))
After subsetting I do have 771 genes that are down regulated. Down regualted genes are invovled in common biological pathways that are not involved in cancerous activities.
Downregulated_genes_filtered <- data.frame(
term_name = Downregulated_Gprofiler_validated$result$term_name[Downregulated_Gprofiler_validated$result$term_size < 200 & Downregulated_Gprofiler_validated$result$term_size > 1],
term_id = Downregulated_Gprofiler_validated$result$term_id[Downregulated_Gprofiler_validated$result$term_size < 200 &
Downregulated_Gprofiler_validated$result$term_size > 1],
source = Downregulated_Gprofiler_validated$result$source[Downregulated_Gprofiler_validated$result$term_size < 200 &
Downregulated_Gprofiler_validated$result$term_size > 1]
)
length(Downregulated_genes_filtered$term_name)
## [1] 771
knitr::kable(Downregulated_genes_filtered$term_name[1:50], format = "html")
| x |
|---|
| tRNA metabolic process |
| cell cycle checkpoint signaling |
| DNA-templated transcription elongation |
| tRNA processing |
| regulation of DNA-templated transcription elongation |
| DNA-templated DNA replication |
| DNA integrity checkpoint signaling |
| tRNA modification |
| regulation of transcription elongation by RNA polymerase II |
| transcription elongation by RNA polymerase II |
| RNA modification |
| negative regulation of mitotic cell cycle |
| DNA damage checkpoint signaling |
| phosphatidylinositol biosynthetic process |
| signal transduction in response to DNA damage |
| mitotic cell cycle checkpoint signaling |
| protein acetylation |
| peptidyl-lysine acetylation |
| glycerophospholipid biosynthetic process |
| phosphatidylinositol metabolic process |
| negative regulation of mitotic cell cycle phase transition |
| histone acetylation |
| positive regulation of transcription elongation by RNA polymerase II |
| rRNA processing |
| internal peptidyl-lysine acetylation |
| microtubule organizing center organization |
| internal protein amino acid acetylation |
| autophagosome organization |
| mitotic DNA integrity checkpoint signaling |
| G1/S transition of mitotic cell cycle |
| nucleic acid phosphodiester bond hydrolysis |
| centrosome cycle |
| positive regulation of DNA-templated transcription, elongation |
| autophagosome assembly |
| RNA phosphodiester bond hydrolysis |
| tRNA methylation |
| protein deacetylation |
| protein deacylation |
| macromolecule deacylation |
| mitotic DNA damage checkpoint signaling |
| DNA-templated DNA replication maintenance of fidelity |
| histone deacetylation |
| RNA methylation |
| histone H3 acetylation |
| regulation of DNA repair |
| double-strand break repair via homologous recombination |
| post-Golgi vesicle-mediated transport |
| recombinational repair |
| vacuole organization |
| positive regulation of transcription initiation by RNA polymerase II |
After subsetting I do have 1128 genes that are down regulated. Up regualted genes are mostly invovled in cancer activity. Most of the pathwyas are quite invovled in cancerous activty. There is a lot apoptosis and programmed cell death activities.
Upregulated_genes_filtered <- data.frame(
term_name = Upregulated_Gprofiler_validated$result$term_name[Upregulated_Gprofiler_validated$result$term_size < 200 & Upregulated_Gprofiler_validated$result$term_size > 1],
term_id = Upregulated_Gprofiler_validated$result$term_id[Upregulated_Gprofiler_validated$result$term_size < 200 &
Upregulated_Gprofiler_validated$result$term_size > 1],
source = Upregulated_Gprofiler_validated$result$source[Upregulated_Gprofiler_validated$result$term_size < 200 &
Upregulated_Gprofiler_validated$result$term_size > 1]
)
length(Upregulated_genes_filtered$term_name)
## [1] 1128
knitr::kable(Upregulated_Gprofiler_validated$result$term_name[1:50], format = "html")
| x |
|---|
| organonitrogen compound biosynthetic process |
| peptide metabolic process |
| translation |
| peptide biosynthetic process |
| organonitrogen compound metabolic process |
| amide metabolic process |
| amide biosynthetic process |
| cytoplasmic translation |
| cellular macromolecule metabolic process |
| cellular macromolecule biosynthetic process |
| protein metabolic process |
| metabolic process |
| cellular metabolic process |
| ATP synthesis coupled electron transport |
| mitochondrial ATP synthesis coupled electron transport |
| nitrogen compound metabolic process |
| respiratory electron transport chain |
| aerobic respiration |
| regulation of protein metabolic process |
| catabolic process |
| electron transport chain |
| aerobic electron transport chain |
| cellular respiration |
| oxidative phosphorylation |
| organic substance metabolic process |
| primary metabolic process |
| macromolecule catabolic process |
| protein localization |
| cellular macromolecule localization |
| macromolecule localization |
| organic substance catabolic process |
| protein catabolic process |
| establishment of protein localization |
| cellular catabolic process |
| programmed cell death |
| cell death |
| ribonucleoprotein complex biogenesis |
| cellular localization |
| organonitrogen compound catabolic process |
| apoptotic process |
| protein-containing complex organization |
| protein-containing complex assembly |
| intracellular transport |
| regulation of programmed cell death |
| generation of precursor metabolites and energy |
| energy derivation by oxidation of organic compounds |
| protein transport |
| cellular component biogenesis |
| ribose phosphate biosynthetic process |
| regulation of cell death |
gprofiler2::gostplot(Upregulated_Gprofiler_validated) %>% plotly::layout(title = "Upregulated genes ", font = list(size = 10))
gprofiler2::gostplot(Downregulated_Gprofiler_validated) %>% plotly::layout(title = "Downregulated genes ", font = list(size = 10))
Complete_Gprofiler_Analyses <- gprofiler2::gost(query = merged_hits$Gene_symbol,
organism = "hsapiens",
exclude_iea = TRUE,
correction_method = "fdr",
sources = c("GO:BP", "REAC", "WP"))
Complete_Gprofiler_Analyses_filtered <- data.frame(term_name = Complete_Gprofiler_Analyses$result$term_name[Complete_Gprofiler_Analyses$result$term_size < 200 &
Complete_Gprofiler_Analyses$result$term_size > 1],
term_id = Complete_Gprofiler_Analyses$result$term_id[Complete_Gprofiler_Analyses$result$term_size < 200 &
Complete_Gprofiler_Analyses$result$term_size > 1],
source = Complete_Gprofiler_Analyses$result$source[Complete_Gprofiler_Analyses$result$term_size < 200 &
Complete_Gprofiler_Analyses$result$term_size > 1])
length(Complete_Gprofiler_Analyses_filtered$term_name)
## [1] 1927
knitr::kable(Complete_Gprofiler_Analyses$result$term_name[1:100], format = "html")
| x |
|---|
| cellular metabolic process |
| primary metabolic process |
| metabolic process |
| cellular macromolecule metabolic process |
| nitrogen compound metabolic process |
| organic substance metabolic process |
| macromolecule metabolic process |
| cellular nitrogen compound metabolic process |
| organonitrogen compound metabolic process |
| protein metabolic process |
| heterocycle metabolic process |
| nucleobase-containing compound metabolic process |
| cellular biosynthetic process |
| cellular response to stress |
| organic substance biosynthetic process |
| cellular aromatic compound metabolic process |
| biosynthetic process |
| organic cyclic compound metabolic process |
| intracellular transport |
| nucleic acid metabolic process |
| macromolecule modification |
| cellular catabolic process |
| catabolic process |
| establishment of localization in cell |
| macromolecule catabolic process |
| macromolecule biosynthetic process |
| protein modification process |
| cellular macromolecule catabolic process |
| cellular nitrogen compound biosynthetic process |
| RNA processing |
| cellular localization |
| gene expression |
| organonitrogen compound biosynthetic process |
| cellular macromolecule biosynthetic process |
| organelle organization |
| organic substance catabolic process |
| RNA metabolic process |
| cellular process |
| cellular response to DNA damage stimulus |
| protein localization to organelle |
| regulation of cellular metabolic process |
| translation |
| cellular macromolecule localization |
| regulation of nitrogen compound metabolic process |
| protein localization |
| mitotic cell cycle |
| peptide biosynthetic process |
| amide biosynthetic process |
| regulation of primary metabolic process |
| macromolecule localization |
| protein catabolic process |
| ribonucleoprotein complex biogenesis |
| mitotic cell cycle process |
| proteolysis involved in protein catabolic process |
| peptide metabolic process |
| modification-dependent macromolecule catabolic process |
| cellular component organization or biogenesis |
| amide metabolic process |
| modification-dependent protein catabolic process |
| intracellular protein transport |
| organonitrogen compound catabolic process |
| cell cycle |
| ubiquitin-dependent protein catabolic process |
| DNA metabolic process |
| regulation of metabolic process |
| regulation of cell cycle |
| mRNA metabolic process |
| positive regulation of nitrogen compound metabolic process |
| regulation of protein metabolic process |
| establishment of protein localization |
| regulation of macromolecule metabolic process |
| cellular component biogenesis |
| positive regulation of cellular metabolic process |
| positive regulation of nucleobase-containing compound metabolic process |
| ribosome biogenesis |
| protein transport |
| DNA repair |
| cell cycle process |
| cellular component organization |
| regulation of catabolic process |
| positive regulation of metabolic process |
| proteasomal protein catabolic process |
| ncRNA metabolic process |
| proteasome-mediated ubiquitin-dependent protein catabolic process |
| positive regulation of macromolecule biosynthetic process |
| positive regulation of cellular biosynthetic process |
| positive regulation of macromolecule metabolic process |
| protein modification by small protein conjugation or removal |
| regulation of cellular catabolic process |
| positive regulation of biosynthetic process |
| heterocycle biosynthetic process |
| regulation of macromolecule biosynthetic process |
| RNA splicing |
| process utilizing autophagic mechanism |
| autophagy |
| protein modification by small protein conjugation |
| regulation of cellular biosynthetic process |
| regulation of organelle organization |
| organic cyclic compound biosynthetic process |
| establishment of protein localization to organelle |
gprofiler2::gostplot(Complete_Gprofiler_Analyses) %>% plotly::layout(title = "Complete plot plot", font = list(size = 10))
Yes the over-representation results we find is correlated to the conclusion the authors make. They conclude that genes having higher GLI2 expression, more differentially expressed, are more possible to be EMT genes. Additionally, we found there is always higher expression on GLI2 treatments given this is a strong oncogene acting in basal-sub-type switching triggering EMT switching. As it is obvious from the visualization the upregualted genes are more clustered and are much more in the downregualted genes This supports the hypothesis of authors, concluding the EMT genes to be higher expression than that of the downregualted genes.
The paper (Pasca di Magliano et al.,2006) has already indicated increase expression of GLI2 can lead to SHH gene reduction. This gene has been known to be an oncogene associated with basal-sub-type switching in PDAC. They activate GLI2 constantly which has the same expected outcome in our data set having GLI2 expressing vectors with constant Dox treatment. In conclusion these two paper findings correlate each other. They show that the EMT genes to be abundant in their biological functions and their pathway invovlement.